Garbage collection
Table of contents
Tracing GC
Traces which objects are live from a set of roots (local, global variables,
etc...) and garbage collects objects which are not referenced from one of the
root.
Note
At any point in time, the GC should be able to trace all values in the
program. For example, even the parameter inf([], g())
should be kept live
during the call tog()
which means it must explore the program stack where
values (intermediaries) are stored.
Semi-Space
This is an approach which relocates all live object to another space during GC.
Therefore, all dead objects will not relocated which compacts the heap.
A question about relocation
When an object is relocated, all subsequent values in the relocating part should
point to the newly relocated heap object. What is the optimal way to do this?
For example, tinylisp takes the opinionated approach of setting cons car to
UNBOUND and cons cdr to the new object in the old semi-space.
if (car_(v) == UNBOUND)
return cdr_(v);
femtolisp uses a similar trick, by tagging the object with TAG_FWD
and then in
the relocating code:
if (isforwarded(v)) return forwardloc(v);
It means that every allocated object should at least hold two values (one tagged
FWD and the newly relocated value). And this can be confirmed in the
alloc_words
function:
static value_t *alloc_words(int n) {
// ...
n = LLT_ALIGN(n, 2); // only allocate multiples of 2 words
// ...
curheap += (n*sizeof(value_t));
// ...
}
Mark and sweep algorithm
Each object has a flag reserved for the GC.
- Take all roots and explore the reference graph and mark each object as being
'in-use'. This phase is called the mark phase. - In the sweep phase, explore all memory blocks and if they are not marked
'in-use' then they can be freed.
Note
Some runtimes use a combinations of semi-space and free-list for generational
garbage collection. Young generations are kept in a semi-space whereas older
objects stay around in a free-list style heap.
Precise garbage collection
The precise garbage collector will not keep live objects that are actually dead.
This can happen in the case of GC wich heuristically scan the stack for any
values looking like a pointer to the program heap.
Load and Read barriers, safepoints
from the llvm gc docs:
- A load barrier is a piece of code executed immediatly after the machine load
instruction but before any use of the loaded value. - A store barrier is a piece of code which runs before any store.
- A safepoint is a location known to have all values rooted (see note in
Garbage collection > Tracing GC).
You can think of those barriers at a point where all values are markable (on the
stack) and not in some intermediary state as on the program registers. Reads
must go through an indirection if gc could have been triggered since the last
store. Otherwise, the value could have been moved and therefore we have a
use-after-free style error.